AITopics | Thanh Hóa Province

Collaborating Authors

Thanh Hóa Province

A Big Data-empowered System for Real-time Detection of Regional Discriminatory Comments on Vietnamese Social Media

Huynh, An Nghiep, Do, Thanh Dat, Do, Trong Hop

arXiv.org Artificial IntelligenceOct-29-2024

Regional discrimination is a persistent social issue in Vietnam. While existing research has explored hate speech in the Vietnamese language, the specific issue of regional discrimination remains under-addressed. Previous studies primarily focused on model development without considering practical system implementation. In this work, we propose a task called Detection of Regional Discriminatory Comments on Vietnamese Social Media, leveraging the power of machine learning and transfer learning models. We have built the ViRDC (Vietnamese Regional Discrimination Comments) dataset, which contains comments from social media platforms, providing a valuable resource for further research and development. Our approach integrates streaming capabilities to process real-time data from social media networks, ensuring the system's scalability and responsiveness. We developed the system on the Apache Spark framework to efficiently handle increasing data inputs during streaming. Our system offers a comprehensive solution for the real-time detection of regional discrimination in Vietnam.

data mining, machine learning, real time system, (18 more...)

arXiv.org Artificial Intelligence

2411.02587

Country:

Asia > Vietnam > Thanh Hóa Province > Thanh Hóa (0.05)
Oceania > Australia (0.04)
North America > United States > California (0.04)
(4 more...)

Genre: Research Report (0.86)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.55)
Information Technology > Data Science > Data Mining > Big Data (0.40)

Add feedback

Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges

Van Dinh, Nguyen, Dang, Thanh Chi, Nguyen, Luan Thanh, Van Nguyen, Kiet

arXiv.org Artificial IntelligenceOct-4-2024

Vietnamese, a low-resource language, is typically categorized into three primary dialect groups that belong to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. Despite the existence of various speech recognition datasets, none of them has provided a fine-grained classification of the 63 dialects specific to individual provinces of Vietnam. To address this gap, we introduce Vietnamese Multi-Dialect (ViMD) dataset, a novel comprehensive dataset capturing the rich diversity of 63 provincial dialects spoken across Vietnam. Our dataset comprises 102.56 hours of audio, consisting of approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and simultaneously demonstrate the challenges of our dataset, we fine-tune state-of-the-art pre-trained models for two downstream tasks: (1) Dialect identification and (2) Speech recognition. The empirical results suggest two implications including the influence of geographical factors on dialects, and the constraints of current approaches in speech recognition tasks involving multi-dialect speech data. Our dataset is available for research purposes.

dataset, dialect, experiment, (17 more...)

arXiv.org Artificial Intelligence

2410.03458

Country:

Asia > Vietnam > Hanoi > Hanoi (0.14)
Asia > Vietnam > Thanh Hóa Province > Thanh Hóa (0.04)
Asia > Vietnam > Hưng Yên Province > Hưng Yên (0.04)
(65 more...)

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Explainable Artificial Intelligence for Drug Discovery and Development -- A Comprehensive Survey

Alizadehsani, Roohallah, Oyelere, Solomon Sunday, Hussain, Sadiq, Calixto, Rene Ripardo, de Albuquerque, Victor Hugo C., Roshanzamir, Mohamad, Rahouti, Mohamed, Jagatheesaperumal, Senthil Kumar

arXiv.org Artificial IntelligenceNov-2-2023

The field of drug discovery has experienced a remarkable transformation with the advent of artificial intelligence (AI) and machine learning (ML) technologies. However, as these AI and ML models are becoming more complex, there is a growing need for transparency and interpretability of the models. Explainable Artificial Intelligence (XAI) is a novel approach that addresses this issue and provides a more interpretable understanding of the predictions made by machine learning models. In recent years, there has been an increasing interest in the application of XAI techniques to drug discovery. This review article provides a comprehensive overview of the current state-of-the-art in XAI for drug discovery, including various XAI methods, their application in drug discovery, and the challenges and limitations of XAI techniques in drug discovery. The article also covers the application of XAI in drug discovery, including target identification, compound design, and toxicity prediction. Furthermore, the article suggests potential future research directions for the application of XAI in drug discovery. The aim of this review article is to provide a comprehensive understanding of the current state of XAI in drug discovery and its potential to transform the field.

drug discovery, prediction, xai model, (9 more...)

arXiv.org Artificial Intelligence

2309.12177

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
South America > Brazil > Ceará > Fortaleza (0.04)
Asia > India > Assam > Dibrugarh District > Dibrugarh (0.04)
(8 more...)

Genre:

Research Report > Promising Solution (0.66)
Overview > Innovation (0.66)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics

Subramanian, Vaishnavi, Syeda-Mahmood, Tanveer, Do, Minh N.

arXiv.org Artificial IntelligenceNov-27-2021

The availability of multi-modality datasets provides a unique opportunity to characterize the same object of interest using multiple viewpoints more comprehensively. In this work, we investigate the use of canonical correlation analysis (CCA) and penalized variants of CCA (pCCA) for the fusion of two modalities. We study a simple graphical model for the generation of two-modality data. We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction. Penalized extensions of CCA (pCCA) that incorporate domain knowledge can discover correlations with high-dimensional, low-sample data, whereas traditional CCA is inapplicable. To facilitate the generation of multi-dimensional embeddings with pCCA, we propose two matrix deflation schemes that enforce desirable properties exhibited by CCA. We propose a two-stage prediction pipeline using pCCA embeddings generated with deflation for latent variable prediction by combining all the above. On simulated data, our proposed model drastically reduces the mean-squared error in latent variable prediction. When applied to publicly available histopathology data and RNA-sequencing data from The Cancer Genome Atlas (TCGA) breast cancer patients, our model can outperform principal components analysis (PCA) embeddings of the same dimension in survival prediction.

data mining, machine learning, prediction, (18 more...)

arXiv.org Artificial Intelligence

2111.13987

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.61)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback